257 research outputs found

    Radiation-Induced Error Criticality in Modern HPC Parallel Accelerators

    In this paper, we evaluate the error criticality of radiation-induced errors on modern High-Performance Computing (HPC) accelerators (Intel Xeon Phi and NVIDIA K40) through a dedicated set of metrics. We show that, as far as imprecise computing is concerned, simple mismatch detection is not sufficient to evaluate and compare the radiation sensitivity of HPC devices and algorithms. Our analysis quantifies and qualifies radiation effects on applications’ output, correlating the number of corrupted elements with their spatial locality. We also provide the mean relative error (dataset-wise) to evaluate radiation-induced error magnitude. We apply the selected metrics to experimental results obtained in various radiation test campaigns, for a total of more than 400 hours of beam time per device. The amount of data we gathered allows us to evaluate the error criticality of a representative set of algorithms from HPC suites. Additionally, based on the characteristics of the tested algorithms, we draw generic reliability conclusions for broader classes of codes. We show that arithmetic operations are less critical for the K40, while the Xeon Phi is more reliable when executing particle interactions solved through Finite Difference Methods. Finally, iterative stencil operations seem the most reliable on both architectures. This work was supported by the STIC-AmSud/CAPES scientific cooperation program under the EnergySFE research project grant 99999.007556/2015-02, EU H2020 Programme, and MCTI/RNP-Brazil under the HPC4E Project, grant agreement n° 689772. Tested K40 boards were donated thanks to Steve Keckler, Timothy Tsai, and Siva Hari from NVIDIA. Postprint (author's final draft)

    Protecting GPU's Microarchitectural Vulnerabilities via Effective Selective Hardening

    Graphics Processing Units (GPUs) are today adopted in several domains for which reliability is fundamental, such as self-driving cars and autonomous machines. Unfortunately, on the one hand, GPUs have been shown to have a high error rate and, on the other hand, the constraints imposed by real-time safety-critical applications make traditional, costly, replication-based hardening solutions inadequate. This paper proposes an effective microarchitectural selective hardening of GPU modules to mitigate those faults that affect the correct execution of instructions. We first characterize, through Register-Transfer Level (RTL) fault injections, the architectural vulnerabilities of a GPU model (FlexGripPlus). We specifically target transient faults in the functional units and pipeline registers of a GPU core. Then, we apply selective hardening by triplicating the locations in each module that we found to be more critical. The results show that selective hardening using Triple Modular Redundancy (TMR) can correct 85% to 99% of faults in the pipeline registers and 50% to 100% of faults in the functional units. The proposed selective TMR strategy reduces the hardware overhead by up to 65% when compared with traditional TMR.
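
    As a reader's aid, the voting step behind TMR can be sketched in a few lines. This is a generic software illustration of bitwise majority voting over three copies of a value, not the RTL design evaluated in the paper; the names `majority_vote` and `flip_bit` are hypothetical.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three redundant copies of a value."""
    # A bit is 1 in the result when at least two of the three copies agree on 1.
    return (a & b) | (a & c) | (b & c)


def flip_bit(value: int, bit: int) -> int:
    """Model a transient fault as a single bit-flip (illustrative assumption)."""
    return value ^ (1 << bit)


golden = 0b10110010
faulty = flip_bit(golden, 3)

# One corrupted copy out of three is outvoted by the two good copies.
corrected = majority_vote(golden, faulty, golden)

# If two copies carry the same fault, the vote returns the faulty value.
uncorrected = majority_vote(faulty, faulty, golden)
```

    The second case shows the limit of the scheme: voting masks a fault only while a majority of the copies remains correct.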

    Revealing GPUs Vulnerabilities by Combining Register-Transfer and Software-Level Fault Injection

    The complexity of both hardware and software makes GPU reliability evaluation extremely challenging. A low-level fault injection on a GPU model, despite being accurate, would take a prohibitively long time (months to years), while software fault injection, despite being quick, cannot access critical resources for GPUs and typically uses synthetic fault models (e.g., single bit-flips) that could result in unrealistic evaluations. This paper proposes to combine the accuracy of Register-Transfer Level (RTL) fault injection with the efficiency of software fault injection. First, on an RTL GPU model (FlexGripPlus), we inject over 1.5 million faults in low-level resources that are unprotected and hidden from the programmer, and characterize their effects on the output of common instructions. We create a pool of possible fault effects on the operation output based on the instruction opcode and input characteristics. We then inject these fault effects, at the application level, using an updated version of a software framework (NVBitFI). Our strategy reduces the fault injection time from the tens of years an RTL evaluation would need to tens of hours, thus allowing, for the first time on GPUs, to track the fault propagation from the hardware to the output of complex applications. Additionally, we provide a more realistic fault model and show that single bit-flip injection would underestimate the error rate of six HPC applications and two convolutional neural networks by up to 48% (18% on average). The RTL fault models and the injection framework we developed are made available in a public repository to enable third-party evaluations and ease reproducibility of results.
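
    For readers unfamiliar with the synthetic fault model mentioned above, a single bit-flip on a 32-bit floating-point value can be sketched as follows. This is only an illustration of the baseline model; it does not use NVBitFI or the paper's RTL-derived fault pool, and `flip_float32_bit` is a hypothetical helper.

```python
import struct


def flip_float32_bit(x: float, bit: int) -> float:
    """Flip one bit (0 = mantissa LSB, 31 = sign) of x's IEEE 754 float32 encoding."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # Flip the selected bit and reinterpret the result as float32 again.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", bits ^ (1 << bit)))
    return corrupted


sign_flip = flip_float32_bit(1.0, 31)      # sign bit
exponent_flip = flip_float32_bit(1.0, 23)  # lowest exponent bit
mantissa_flip = flip_float32_bit(1.0, 0)   # lowest mantissa bit
```

    The same single-bit model produces very different error magnitudes depending on the bit position, which is one reason the choice of fault model influences the measured error rate.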

    The evolution of dam-litter microbial flora from birth to 60 days of age

    BACKGROUND: Early bacterial colonization in puppies is still a poorly understood phenomenon. Although the topic is of considerable interest, a large knowledge gap remains regarding the timing and features of neonatal gut colonization. Hence, the purpose of this study was to evaluate the relationship between dam and litter microbial flora, in vaginally delivered puppies, from birth to two months of age. Bacteria were identified using MALDI-TOF, an accurate and sensitive method, and cluster analysis of the data provided new insight into the investigated topic. METHODS: Six dam-litter units of two medium-sized breeds were enrolled in the study. Vaginal and colostrum/milk samples were collected from dams after delivery and 48h post-partum, while rectal samples were taken from dams and puppies after delivery and at days 2, 30 and 60 (T2, T30 and T60, respectively) post-partum. Bacterial isolation and identification were performed following standard techniques, then the data were analyzed using a new approach based on bacterial genus population composition obtained using a wide MALDI-TOF screening and cluster analysis. RESULTS: Forty-eight bacteriological samples were collected from the dams and 145 from their 42 puppies. Colostrum/milk samples (n = 12) showed bacterial growth mainly limited to a few colonies. Staphylococci, Enterococci, E. coli and Proteus spp. were most frequently isolated. All vaginal swabs (n = 12) resulted in bacterial isolation (medium to high growth). Streptococci, Enterococci and E. coli were the most frequently detected. E. coli, Proteus mirabilis, Enterococcus spp. and Streptococcus spp. were often obtained from dams’ and puppies’ rectal swabs. Clostridia, not isolated in any other sampling site, were rarely found (n = 3) in meconium, while they were more frequently isolated at later times (T2: n = 30; T30: n = 17; T60: n = 27). Analysis of the bacterial genus pattern over time showed a statistically significant reduction (P < 0.01) in the heterogeneity of microbial composition at all time points compared with birth for each dam-litter unit. These results were confirmed with cluster analysis and two-dimensional scaling. CONCLUSION: This novel data analysis suggests a fundamental role of the individual dam in seeding and shaping the microbiome of the litter. Thus, modulating the dam’s microbiota may positively impact the puppy microbiota and benefit their health. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12917-022-03199-3.

    HIPE: HMC Instruction Predication Extension Applied on Database Processing

    The recent Hybrid Memory Cube (HMC) is a smart memory that includes functional units inside one logic layer of the 3D stacked memory design. In order to execute instructions inside the HMC, the processor needs to send instructions to be executed near data, keeping most of the pipeline complexity inside the processor. Thus, control-flow and data-flow dependencies are all managed inside the processor, in such a way that only update instructions are supported by the HMC. In order to solve data-flow dependencies inside the memory, previous work proposed HMC Instruction Vector Extensions (HIVE), which embeds a high number of functional units with an interlock register bank. In this work we propose HMC Instruction Predication Extensions (HIPE), which support predicated execution inside the memory in order to transform control-flow dependencies into data-flow dependencies. Our mechanism focuses on removing the high-latency iteration between the processor and the smart memory during the execution of branches that depend on data processed inside the memory. In this paper we evaluate a balanced design of HIVE compared to x86 and HMC executions. We then show the results of the HIPE mechanism when executing a database workload, which is a strong candidate to use smart memories. We show interesting performance trade-offs when comparing our mechanism to previous work.
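
    The idea of turning a control-flow dependency into a data-flow dependency can be illustrated with a small database-style selection. This is a generic software sketch of predication, not the HIPE instruction set; the function names and the sample column are hypothetical.

```python
def filter_sum_branch(values, threshold):
    """Sum the values below threshold using a data-dependent branch."""
    total = 0
    for v in values:
        if v < threshold:  # control flow depends on the loaded data
            total += v
    return total


def filter_sum_predicated(values, threshold):
    """Same selection, with the comparison result used as data instead of a branch."""
    total = 0
    for v in values:
        pred = int(v < threshold)  # predicate materialized as 0 or 1
        total += pred * v          # always executed; the predicate masks the update
    return total


column = [5, 12, 7, 30, 1]
branch_result = filter_sum_branch(column, 10)
predicated_result = filter_sum_predicated(column, 10)
```

    In the predicated form every iteration performs the same operations, so no branch outcome has to be resolved before the next update is issued.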